Multi-Scale Audio Spectrogram Transformer for Classroom Teaching Interaction Recognition
Authors
Abstract
Classroom interactivity is one of the important metrics for assessing classrooms, but identifying classroom interactivity from classroom image data is limited by interference from complex teaching scenarios. Audio data recorded in the classroom, however, clearly reflect student–teacher interaction. This study proposes a multi-scale audio spectrogram transformer (MAST) speech scene classification algorithm and constructs a classroom interactive audio dataset to recognize teacher–student interaction during the teaching process. First, the original speech signal is sampled and pre-processed to generate a multi-channel spectrogram, which represents features better than single-channel features do. Second, to efficiently capture the long-range global context of the spectrogram, the audio features are modeled globally by the multi-head self-attention mechanism of MAST, and the feature resolution is reduced during feature extraction so that layer-level features are continuously enriched while model complexity is reduced. Finally, a time-frequency enrichment module maps the final output to a class feature map, enabling accurate audio category recognition. MAST is evaluated experimentally on public environmental audio datasets and on the self-built classroom audio interaction dataset. Compared with previous state-of-the-art methods on the public AudioSet and ESC-50 datasets, its accuracy improves by 3% and 5%, respectively, and its accuracy on the self-built classroom audio interaction dataset reaches 92.1%. These results demonstrate the effectiveness of MAST both for general audio classification and in the smart classroom domain.
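The abstract describes the second step (global multi-head self-attention followed by a reduction in feature resolution) only at a high level. The pure-Python sketch below is a hedged illustration of that general idea, not the authors' implementation. It applies scaled dot-product attention across all spectrogram patches and then merges each 2×2 block of patches, in the style of Swin Transformer patch merging. Several pieces are illustrative assumptions: the identity query/key/value projections, the concatenation-based merging without its usual learned linear layer, and every name used (`hierarchical_stage`, `patch_merge`, `multi_head_self_attention`).

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # indexed as [frequency patch][time patch][channel]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens: List[Vector], num_heads: int) -> List[Vector]:
    """Global scaled dot-product self-attention over all tokens.

    Simplifying assumption: the learned query, key, value and output
    projections are replaced by the identity, so each head attends over
    its own slice of the raw features.
    """
    dim = len(tokens[0])
    assert dim % num_heads == 0, "feature dim must divide evenly into heads"
    head_dim = dim // num_heads
    scale = 1.0 / math.sqrt(head_dim)
    out = [[0.0] * dim for _ in tokens]
    for h in range(num_heads):
        lo = h * head_dim
        heads = [t[lo:lo + head_dim] for t in tokens]
        for i, q in enumerate(heads):
            # Every token attends to every token: global time-frequency context.
            weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in heads])
            for j, v in enumerate(heads):
                for c in range(head_dim):
                    out[i][lo + c] += weights[j] * v[c]
    return out


def patch_merge(grid: Grid) -> Grid:
    """Halve frequency and time resolution by concatenating each 2x2 block.

    Mirrors Swin-style patch merging (channels grow 4x); the learned linear
    layer that usually follows is omitted. An odd trailing row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[r][c] + grid[r + 1][c] + grid[r][c + 1] + grid[r + 1][c + 1]
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def hierarchical_stage(grid: Grid, num_heads: int) -> Grid:
    """One stage: global attention over all patches, then downsample."""
    rows, cols = len(grid), len(grid[0])
    flat = [vec for row in grid for vec in row]
    attended = multi_head_self_attention(flat, num_heads)
    regridded = [attended[r * cols:(r + 1) * cols] for r in range(rows)]
    return patch_merge(regridded)


if __name__ == "__main__":
    # Hypothetical input: 4 frequency x 8 time patches with 4 channels each.
    grid = [[[float(f), float(t), 0.5, 1.0] for t in range(8)] for f in range(4)]
    stage1 = hierarchical_stage(grid, num_heads=2)    # 2 x 4 patches, 16 channels
    stage2 = hierarchical_stage(stage1, num_heads=4)  # 1 x 2 patches, 64 channels
    print(len(stage1), len(stage1[0]), len(stage1[0][0]))
    print(len(stage2), len(stage2[0]), len(stage2[0][0]))
```

Each stage cuts the number of tokens by a factor of four (32, then 8, then 2 in the example). Because the cost of global attention is quadratic in the token count, it shrinks quickly from stage to stage, while the channel dimension grows so that each remaining token carries richer features. This is one common way to trade resolution for model complexity, and the abstract does not say whether MAST uses exactly this scheme.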
Similar resources
Multi-scale Enveloping Spectrogram for Bearing Defect Detection
This paper presents a new signal processing technique for bearing defect detection, called Multi-Scale Enveloping Spectrogram (MUSENS). The technique decomposes vibration signals measured on rolling bearings into different scales by means of a continuous wavelet transform (CWT). The envelope signal in each scale is then calculated from the modulus of the wavelet coefficients. Subsequently, Four...
Universität Augsburg Audio Brush : Editing Audio in the Spectrogram
A tool for editing audio signals in the spectrogram is presented. It allows manipulating the spectrogram of a signal at any chosen time-frequency resolution directly and to reconstruct the edited signal in HiFi quality – a capability that is usually not possible with the Fourier or wavelet transformation. Image processing and computer vision methods are applied to the spectrogram in order to id...
Universität Augsburg Audio Brush : Smart Audio Editing in the Spectrogram
Starting with a novel audio analysis and editing paradigm, a set of new and adaptive audio analysis and editing algorithms in the spectrogram are developed and integrated into a smart visual audio editing tool in a “what you see is what you hear” style. At the core of our algorithms and methods is a very flexible audio spectrogram that goes beyond FFT and Wavelets and supports manipulating a si...
Audio-visual Interaction in Model Adaptation for Multi-modal Speech Recognition
This paper investigates audio-visual interaction, i.e. inter-modal influences, in linear-regressive model adaptation for multi-modal speech recognition. In the multi-modal adaptation, inter-modal information may contribute to the performance of speech recognition. Thus the influence and advantage of intermodal elements should be examined. Experiments were conducted to evaluate several transformati...
Audio Spectrogram Representations for Processing with Convolutional Neural Networks
One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-c...
Journal
Journal title: Future Internet
Year: 2023
ISSN: 1999-5903
DOI: https://doi.org/10.3390/fi15020065